Automated dimensionality reduction of data warehouses
Abstract
A data warehouse is designed to consolidate and maintain all attributes that are relevant to the analysis processes. Because of the rapid growth in the size of modern operational systems, it is neither practical nor necessary to load and maintain every operational attribute in the data warehouse. This paper presents a novel methodology for automated selection of the most relevant independent attributes in a data warehouse. The method is based on the information-theoretic approach to knowledge discovery in databases. Attributes are selected by a stepwise forward procedure aimed at minimizing the uncertainty in the values of key performance indicators (KPIs). Each selected attribute is assigned a score expressing its degree of relevance. The method requires no prior expertise in the domain of the data, and it applies equally to nominal and ordinal attributes. An attribute is included in the data warehouse schema if it is found relevant to at least one KPI. We demonstrate the applicability of the method by reducing the dimensionality of a direct marketing database.
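The stepwise forward procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that "uncertainty" is measured as the conditional Shannon entropy of a KPI, that an attribute's score is the entropy reduction it contributes when added, and that selection stops when the best remaining gain falls below a fixed threshold. The function names and the `min_gain` parameter are hypothetical; the paper may use a different stopping rule, such as a statistical significance test.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, rows, attrs):
    """H(target | attrs): entropy of the target inside each group of
    records that share the same values on attrs, weighted by group size."""
    groups = {}
    for row, t in zip(rows, target):
        groups.setdefault(tuple(row[a] for a in attrs), []).append(t)
    n = len(target)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def forward_select(rows, target, candidates, min_gain=0.01):
    """Greedy forward selection for one KPI (assumed stopping rule:
    stop when the best entropy reduction is below min_gain bits)."""
    selected, scores = [], {}
    current = entropy(target)
    remaining = list(candidates)
    while remaining:
        # Pick the attribute that leaves the least uncertainty in the KPI.
        best = min(
            remaining,
            key=lambda a: conditional_entropy(target, rows, selected + [a]),
        )
        h = conditional_entropy(target, rows, selected + [best])
        gain = current - h
        if gain < min_gain:
            break
        selected.append(best)
        scores[best] = gain  # relevance score: bits of uncertainty removed
        remaining.remove(best)
        current = h
    return selected, scores


def select_for_warehouse(rows, kpi_targets, candidates, min_gain=0.01):
    """Keep an attribute if it is selected for at least one KPI."""
    kept = set()
    for target in kpi_targets.values():
        selected, _ = forward_select(rows, target, candidates, min_gain)
        kept.update(selected)
    return kept
```

Here `rows` is a list of dictionaries mapping attribute names to nominal or ordinal values, and each KPI is a list of values aligned with `rows`. Because entropy estimates on small groups are biased toward zero, a sketch like this tends to over-select on sparse data, which is one reason a significance test may be preferable to a fixed threshold.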
Similar resources
2D Dimensionality Reduction Methods without Loss
In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) techniques have been applied in a lossless dimensionality reduction framework for face recognition. In this framework, the benefits of dimensionality reduction were used to improve the performance of its predictive model, which was a support vector machine (...
PCA-Based Out-of-Sample Extension for Dimensionality Reduction
Dimensionality reduction methods are very common in high-dimensional data analysis, where classical analysis methods are inadequate. Algorithms for dimensionality reduction are typically computationally expensive, so applying them to data warehouses is impractical. This is even more evident when the data is accumulated continuously. In this paper, an out-of-sam...
A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters
Redundant and irrelevant features in high-dimensional data increase the complexity of the underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...
Performing a Preprocessing Step Before the Feature Extraction Step in the Classification of Hyperspectral Image Data
Hyperspectral data potentially contain more information than multispectral data because of their higher spectral resolution. However, the stochastic data analysis approaches that have been successfully applied to multispectral data are not as effective for hyperspectral data. Various investigations indicate that the key problem that causes poor performance in the stochastic approaches t...
Impact of linear dimensionality reduction methods on the performance of anomaly detection algorithms in hyperspectral images
Anomaly Detection (AD) has recently become an important application of hyperspectral image analysis. The goal of these algorithms is to find the objects in the image scene that are anomalous in comparison to their surrounding background. One way to improve the performance and runtime of these algorithms is to use Dimensionality Reduction (DR) techniques. This paper evaluates the effect of thr...